Jack Davis
2025-09-10
Collectibles are a way to invest in a league (through sealed product) or a player (through singles) without needing a large amount of capital.
Cards are the most tradeable of collectibles.
I have a vision of a future in which cards are paired with the rights to draft that player in fantasy sports, by linking each copy of a card to an ID via a QR code. (I don’t actually think this idea is as important as insulin, but a little salesmanship never hurt.)
I also have a hunch that the value of the collectibles market as a whole is a leading indicator of broader economic conditions, because collectibles are convenient stores of value.
Card collecting has gotten more interesting in recent years with the introduction of metal plate cards and of memorabilia, like pieces of jerseys and bats, embedded in the cards.
Source: https://www.spudart.org/blog/relic-cards-photographed-on-jerseys/
There was a crash in the early 90s, at the tail end of what is called the “Junk Wax Era”, when way too many cards were printed relative to the demand.
Demand for junk wax never did pick up again, which is why it’s possible to buy, in 2025, a sealed box of junk wax from 1990-1993 for 30 dollars at an antiques mall or farmer’s market.
Demand for cards in general recovered, just not cards from that era.
In particular, sports cards and related collectibles got really hot in 2020-2021. (Speculation at the time was that people couldn’t experience live sports on TV, so they got their fix in other ways. Personally, I can attest that there were people waiting at Walmart for the card shelves to be restocked.)
This is the first talk of the term, so let’s focus on the ‘how’.
First, I got a PDF of a 2024 copy of the Beckett price guide for cards. It contains tens of thousands of card prices.
Each listing has the printed set that it came from in bold at the top (e.g., 2017 Panini Gold Standard), followed by the values of the individual cards (in mint condition).
Here we see that any common card from this set is worth between 60 cents and $1.50 USD, depending on retail location.
Any common autographed jersey-piece card not otherwise listed by name below goes for between 3 and 8 dollars. Each of these cards was printed in a run of 199 copies, each with a piece of jersey embedded in it (COMMON JSY AU p/r 199).
Any of the 269 Ichiro cards is worth between 1.50 and 4 USD (Ichiro/269).
This PDF wasn’t searchable because each page was embedded as an image, so I used optical character recognition (OCR) with the tesseract package in R to get the data into raw text.
However, there was an issue: while the original Tesseract software can automatically detect when text is in columns, the R package that wraps Tesseract doesn’t seem to have that option. That could be worked around in text processing, except that the titles of the card sets aren’t the same font size as the listings, so the rows don’t line up and reading row by row produces nonsense.
How can we split this into columns? Take a look at the average brightness of the pixels of a page, arranged by column. The downward spikes correspond to the vertical column bars printed on the page.
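The original work was done in R, so what follows is only a rough Python sketch of the same idea, not the code used for the talk. It assumes the page has already been loaded as a grayscale matrix (a list of rows of 0-255 values), and the `drop` threshold of half the median brightness is an arbitrary assumption that would need tuning on real scans.

```python
from statistics import median

def column_brightness(image):
    """Average brightness of each pixel column; image is a list of equal-length rows."""
    height = len(image)
    width = len(image[0])
    return [sum(row[x] for row in image) / height for x in range(width)]

def find_dividers(brightness, drop=0.5):
    """Return the middle x position of each run of unusually dark columns.

    A column counts as dark when its average brightness is below
    drop * median brightness (an assumed, tunable threshold).
    """
    cutoff = drop * median(brightness)
    dividers, run = [], []
    for x, value in enumerate(brightness):
        if value < cutoff:
            run.append(x)            # still inside a downward spike
        elif run:
            dividers.append(run[len(run) // 2])
            run = []
    if run:                          # spike touching the right edge
        dividers.append(run[len(run) // 2])
    return dividers

def split_columns(image, dividers):
    """Cut the image into vertical strips at the divider positions."""
    edges = [0] + dividers + [len(image[0])]
    return [[row[a:b] for row in image] for a, b in zip(edges, edges[1:]) if b > a]
```

Each strip can then be passed to OCR on its own. Note that in this sketch the divider pixel itself ends up at the left edge of the following strip, which may or may not matter for the OCR step.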
By splitting the image at the downward spikes, and applying OCR and some light text processing, we get text like this:
Applying data wrangling and some subject knowledge, we get a data frame with the card prices and player and card set names.
…and by merging in the WAR (Wins Above Replacement) data, as well as career data like year and lifetime earnings from an additional pre-scraped source, we have all the information we need to do a simple analysis.
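To give a feel for the wrangling step, here is a minimal Python sketch (the original used R). The line layout is an assumption: it supposes OCR lines shaped like `Ichiro/269 1.50 4.00` or `COMMON JSY AU p/r 199 3.00 8.00`, with set titles starting with a four-digit year. Real OCR output is messier, and the pattern names and the `parse_lines` helper are hypothetical.

```python
import re

# Assumed line shape: "<name>[/<print run>] <low price> <high price>"
LISTING = re.compile(
    r"^(?P<name>.+?)(?:/(?P<run>\d+))?\s+(?P<low>\d*\.\d{2})\s+(?P<high>\d*\.\d{2})$"
)
# Assumed: a set title starts with a four-digit year and carries no prices
SET_TITLE = re.compile(r"^(?:19|20)\d{2}\s+\S.*$")
# Assumed: "p/r 199" at the end of a name is a print run
PRINT_RUN_SUFFIX = re.compile(r"\s+p/r\s+(\d+)$")

def parse_lines(lines):
    """Turn OCR lines into row dicts, carrying the most recent set title forward."""
    rows, current_set = [], None
    for raw in lines:
        line = raw.strip()
        match = LISTING.match(line)
        if match:
            name, run = match["name"], match["run"]
            if run is None:
                suffix = PRINT_RUN_SUFFIX.search(name)
                if suffix:
                    run = suffix.group(1)
                    name = name[:suffix.start()]
            rows.append({
                "set": current_set,
                "name": name,
                "print_run": int(run) if run else None,
                "low": float(match["low"]),
                "high": float(match["high"]),
            })
        elif SET_TITLE.match(line):
            current_set = line
        # anything else is treated as OCR noise and skipped
    return rows
```

Lines that match neither pattern are silently dropped here; counting them instead would show how much data is being lost to special cases.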
https://github.com/Neil-Paine-1/MLB-WAR-data-historical
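As a sketch of what the merge might look like (in Python rather than the R used for the talk), the snippet below does a left join on a normalised player name. The column names `player`, `war`, and `last_year` are assumptions for illustration and are not taken from the linked repository.

```python
import unicodedata

def name_key(name):
    """Normalise a player name for matching: drop accents, punctuation, and case."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    kept = "".join(ch for ch in ascii_name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def merge_war(cards, war_rows):
    """Left-join card rows to career rows on the normalised player name.

    Cards with no matching player keep None for the career fields,
    so unmatched rows can be counted rather than silently lost.
    """
    war_by_player = {name_key(row["player"]): row for row in war_rows}
    merged = []
    for card in cards:
        career = war_by_player.get(name_key(card["name"]), {})
        merged.append({
            **card,
            "war": career.get("war"),
            "last_year": career.get("last_year"),
        })
    return merged
```

Short names as printed in the guide (a single name, for example) will not match a full name in the career data under this scheme, so some hand-built lookup would likely still be needed.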
First, let’s look at the title trend and compare the price of the cards to career WAR. Notice that I used the third quartile (Q3) of the higher retail price. That’s because I wanted to reduce the effect of the extremely high-priced cards, as well as weed out a lot of the junk that gets printed, especially of popular players. Maybe a quantile regression would work better for this.
Also, WAR is on a log scale.
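Here is a small Python sketch of that summary (the original analysis was in R). It assumes Q3 is taken per player across that player's listed cards, which the text above doesn't spell out, and it drops players with zero or negative WAR because they can't be placed on a log scale. Both choices are assumptions, as are the `name` and `high` field names.

```python
import math
from collections import defaultdict
from statistics import quantiles

def q3_high_price_by_player(rows):
    """Third quartile of the higher retail price, grouped by player name."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["name"]].append(row["high"])
    result = {}
    for name, values in prices.items():
        if len(values) > 1:
            # quantiles(n=4) returns the three quartile cut points; index 2 is Q3
            result[name] = quantiles(values, n=4, method="inclusive")[2]
        else:
            result[name] = values[0]  # a single card is its own Q3
    return result

def plot_points(q3_by_player, war_by_player):
    """Pair log10(WAR) with Q3 price, skipping missing or non-positive WAR."""
    points = []
    for name, q3 in q3_by_player.items():
        war = war_by_player.get(name)
        if war is not None and war > 0:
            points.append((math.log10(war), q3))
    return points
```

Taking a quartile rather than the mean keeps a handful of very expensive cards from dominating a player's summary price.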
Speaking of printing a lot of popular players, the trend for the number of individually listed cards printed (i.e., ones that aren’t bundled into the ‘common’ category) against WAR is a lot clearer.
We also have the year that the player last played as a predictor of price. I was actually looking for a declining line, which might suggest that older cards of older players are worth more. There didn’t seem to be a trend here.
Looking farther out, there’s only a very slight downward slope, and it reverses when we reach active players who have played in the last year.
Much more advanced modeling. This could really use a mixed-effects model, with each player name as a random effect.
More data cleaning, because I’m still losing 10-15% of the data to special cases.
More data gathering, because the year printed isn’t diverse enough (only 2016-2020 recorded) to check whether older printed cards are more or less valuable.
Data available on UWAGGS Discord and website soon for anyone who wants to pick this up.